This paper introduces Mini-Gemini, a framework for enhancing multi-modality vision-language models (VLMs) and narrowing their performance gap with advanced models such as GPT-4 and Gemini. It improves performance and extends capabilities across image understanding, reasoning, and generation. The approach rests on three components: efficient high-resolution visual tokens, high-quality training data, and integration with generative models.